6 research outputs found
Collaborative ranking from ordinal data
Personalized recommendation systems must predict the preferences of a user for items the user has not yet seen. For cardinal (ratings) data, personalized preference prediction has been solved efficiently over the past few years using matrix-factorization-related techniques. Recent studies have shown that ordinal (comparison) data can outperform cardinal data in learning preferences, but learning personalized preferences from ordinal data has received little study. This thesis presents a matrix-factorization-inspired convex relaxation algorithm that collaboratively learns the hidden preferences of users through the multinomial logit (MNL) model, a discrete choice model. It also shows that the algorithm is efficient in the number of observations needed.
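The core of the MNL model is a softmax over latent user-item utilities; collaborative structure comes from assuming the utility matrix is (approximately) low rank. A minimal sketch of that choice probability, with illustrative variable names that are not taken from the thesis:

```python
import numpy as np

def mnl_choice_probs(theta_u, choice_set):
    """Multinomial logit: probability that a user picks each item offered in
    `choice_set`, given that user's latent utility vector `theta_u`.
    Illustrative sketch only; names and shapes are assumptions."""
    scores = theta_u[choice_set]
    scores = scores - scores.max()  # shift for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum()        # softmax over the offered set

# Low-rank utility matrix Theta = U @ V.T encodes collaborative structure:
# each user's utilities are a combination of a few latent item factors.
rng = np.random.default_rng(0)
U = rng.normal(size=(5, 2))   # 5 users, rank-2 factors
V = rng.normal(size=(10, 2))  # 10 items
Theta = U @ V.T

probs = mnl_choice_probs(Theta[0], [1, 4, 7])
print(probs)  # choice probabilities over the three offered items; sums to 1
```

The convex relaxation in the thesis replaces the hard rank constraint on Theta with a tractable surrogate (such as a nuclear-norm bound) when fitting this model to observed comparisons; that optimization step is not shown here.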
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
While deep learning (DL) models are state-of-the-art in text and image
domains, they have not yet consistently outperformed Gradient Boosted Decision
Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent
performance gains attained by DL models in text and image tasks have used
unsupervised pretraining, which exploits orders of magnitude more unlabeled
data than labeled data. To the best of our knowledge, unsupervised pretraining
has not been applied to the LTR problem, which often produces vast amounts of
unlabeled data.
In this work, we study whether unsupervised pretraining of deep models can
improve LTR performance over GBDTs and other non-pretrained models. By
incorporating simple design choices--including SimCLR-Rank, an LTR-specific
pretraining loss--we produce pretrained deep learning models that consistently
(across datasets) outperform GBDTs (and other non-pretrained rankers) in the
case where there is more unlabeled data than labeled data. This performance
improvement occurs not only on average but also on outlier queries. We base our
empirical conclusions on experiments with (1) public benchmark tabular LTR
datasets and (2) a large industry-scale proprietary ranking dataset. Code is
provided at https://anonymous.4open.science/r/ltr-pretrain-0DAD/README.md.
Comment: ICML-MFPL 2023 Workshop Ora
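The abstract builds on SimCLR-style contrastive pretraining. A generic sketch of the underlying NT-Xent objective between two augmented views of the same batch follows; this is the standard SimCLR loss, not the paper's SimCLR-Rank variant, whose ranking-specific adaptations are not described in the abstract. All names here are illustrative assumptions:

```python
import numpy as np

def ntxent_loss(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent contrastive loss. `z1` and `z2` are embeddings
    of two augmented views of the same n examples (shape (n, d)); row i of
    z1 and row i of z2 form the positive pair, all other rows are negatives."""
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = (z @ z.T) / tau
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # drop self-similarity
    # index of each row's positive partner: i <-> i + n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy of the positive against all other pairs (log-sum-exp)
    row_max = sim.max(axis=1, keepdims=True)
    lse = row_max.squeeze() + np.log(np.exp(sim - row_max).sum(axis=1))
    return -(sim[np.arange(2 * n), pos] - lse).mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))   # view 1 of a batch of 4 unlabeled examples
z2 = z1 + 0.1 * rng.normal(size=(4, 8))  # view 2: a small perturbation
loss = ntxent_loss(z1, z2)
print(loss)  # a positive scalar; smaller when paired views agree
```

In the pretraining setting the abstract describes, a loss of this kind is minimized on the abundant unlabeled queries before the ranker is fine-tuned on the scarce labeled data.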